<!DOCTYPE html>
<html prefix="og: http://ogp.me/ns# article: http://ogp.me/ns/article# " lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>using-pipelines-for-multiple-preprocessing-steps | 绿萝间</title>
<link href="../assets/css/all-nocdn.css" rel="stylesheet" type="text/css">
<link href="../assets/css/ipython.min.css" rel="stylesheet" type="text/css">
<link href="../assets/css/nikola_ipython.css" rel="stylesheet" type="text/css">
<meta name="theme-color" content="#5670d4">
<meta name="generator" content="Nikola (getnikola.com)">
<link rel="alternate" type="application/rss+xml" title="RSS" href="../rss.xml">
<link rel="canonical" href="https://muxuezi.github.io/posts/using-pipelines-for-multiple-preprocessing-steps.html">
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
    tex2jax: {
        inlineMath: [ ['$','$'], ["\\(","\\)"] ],
        displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
        processEscapes: true
    },
    displayAlign: 'center', // Change this to 'center' to center equations.
    "HTML-CSS": {
        styles: {'.MathJax_Display': {"margin": 0}}
    }
});
</script><!--[if lt IE 9]><script src="../assets/js/html5.js"></script><![endif]--><meta name="author" content="Tao Junjie">
<link rel="prev" href="using-gaussian-processes-for-regression.html" title="using-gaussian-processes-for-regression" type="text/html">
<link rel="next" href="using-stochastic-gradient-descent-for-regression.html" title="using-stochastic-gradient-descent-for-regression" type="text/html">
<meta property="og:site_name" content="绿萝间">
<meta property="og:title" content="using-pipelines-for-multiple-preprocessing-steps">
<meta property="og:url" content="https://muxuezi.github.io/posts/using-pipelines-for-multiple-preprocessing-steps.html">
<meta property="og:description" content="用管线命令处理多个步骤¶








管线命令不经常用，但是很有用。它们可以把多个步骤组合成一个对象执行。这样可以更方便灵活地调节和控制整个模型的配置，而不只是一个一个步骤调节。









Getting ready¶








这是我们把多个数据处理步骤组合成一个对象的第一部分。在scikit-learn里称为pipeline。这里我们首先通过计算处理缺失值；然后将数据集调整为">
<meta property="og:type" content="article">
<meta property="article:published_time" content="2015-07-27T14:58:57+08:00">
<meta property="article:tag" content="CHS">
<meta property="article:tag" content="ipython">
<meta property="article:tag" content="Machine Learning">
<meta property="article:tag" content="Python">
<meta property="article:tag" content="scikit-learn cookbook">
</head>
<body>
<a href="#content" class="sr-only sr-only-focusable">Skip to main content</a>

<!-- Menubar -->

<nav class="navbar navbar-inverse navbar-static-top"><div class="container">
<!-- This keeps the margins nice -->
        <div class="navbar-header">
            <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-navbar" aria-controls="bs-navbar" aria-expanded="false">
            <span class="sr-only">Toggle navigation</span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            </button>
            <a class="navbar-brand" href="https://muxuezi.github.io/">

                <span id="blog-title">绿萝间</span>
            </a>
        </div>
<!-- /.navbar-header -->
        <div class="collapse navbar-collapse" id="bs-navbar" aria-expanded="false">
            <ul class="nav navbar-nav">
<li>
<a href="../archive.html">Archive</a>
                </li>
<li>
<a href="../categories/">Tags</a>
                </li>
<li>
<a href="../rss.xml">RSS feed</a>

                
            </li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
    <a href="using-pipelines-for-multiple-preprocessing-steps.ipynb" id="sourcelink">Source</a>
    </li>

                
            </ul>
</div>
<!-- /.navbar-collapse -->
    </div>
<!-- /.container -->
</nav><!-- End of Menubar --><div class="container" id="content" role="main">
    <div class="body-content">
        <!--Body content-->
        <div class="row">
            
            
<article class="post-text h-entry hentry postpage" itemscope="itemscope" itemtype="http://schema.org/Article"><header><h1 class="p-name entry-title" itemprop="headline name"><a href="#" class="u-url">using-pipelines-for-multiple-preprocessing-steps</a></h1>

        <div class="metadata">
            <p class="byline author vcard"><span class="byline-name fn">
                    Tao Junjie
            </span></p>
            <p class="dateline"><a href="#" rel="bookmark"><time class="published dt-published" datetime="2015-07-27T14:58:57+08:00" itemprop="datePublished" title="2015-07-27 14:58">2015-07-27 14:58</time></a></p>
            
        <p class="sourceline"><a href="using-pipelines-for-multiple-preprocessing-steps.ipynb" id="sourcelink">Source</a></p>

        </div>
        

    </header><div class="e-content entry-content" itemprop="articleBody text">
    <div tabindex="-1" id="notebook" class="border-box-sizing">
    <div class="container" id="notebook-container">

<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="用管线命令处理多个步骤">用管线命令处理多个步骤<a class="anchor-link" href="using-pipelines-for-multiple-preprocessing-steps.html#%E7%94%A8%E7%AE%A1%E7%BA%BF%E5%91%BD%E4%BB%A4%E5%A4%84%E7%90%86%E5%A4%9A%E4%B8%AA%E6%AD%A5%E9%AA%A4">¶</a>
</h2>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>管线命令不经常用，但是很有用。它们可以把多个步骤组合成一个对象执行。这样可以更方便灵活地调节和控制整个模型的配置，而不只是一个一个步骤调节。</p>
<!-- TEASER_END -->
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Getting-ready">Getting ready<a class="anchor-link" href="using-pipelines-for-multiple-preprocessing-steps.html#Getting-ready">¶</a>
</h3>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>这是我们把多个数据处理步骤组合成一个对象的第一部分。在scikit-learn里称为<code>pipeline</code>。这里我们首先通过计算处理缺失值；然后将数据集调整为均值为0，标准差为1的标准形。</p>
<p>让我们创建一个有缺失值的数据集，然后再演示<code>pipeline</code>的用法：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [1]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn</span> <span class="k">import</span> <span class="n">datasets</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="n">mat</span> <span class="o">=</span> <span class="n">datasets</span><span class="o">.</span><span class="n">make_spd_matrix</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="n">masking_array</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">binomial</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">.</span><span class="mi">1</span><span class="p">,</span> <span class="n">mat</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">bool</span><span class="p">)</span>
<span class="n">mat</span><span class="p">[</span><span class="n">masking_array</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">nan</span>
<span class="n">mat</span><span class="p">[:</span><span class="mi">4</span><span class="p">,</span> <span class="p">:</span><span class="mi">4</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[1]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 1.05419595,  1.42287309,  0.02368264, -0.8505244 ],
       [ 1.42287309,  5.09704588,         nan, -2.46408728],
       [ 0.02368264,  0.03614203,  0.63317494,  0.09792298],
       [-0.8505244 , -2.46408728,  0.09792298,  2.04110849]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="How-to-do-it...">How to do it...<a class="anchor-link" href="using-pipelines-for-multiple-preprocessing-steps.html#How-to-do-it...">¶</a>
</h3>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>如果不用管线命令，我们可能会这样实现：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [2]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn</span> <span class="k">import</span> <span class="n">preprocessing</span>
<span class="n">impute</span> <span class="o">=</span> <span class="n">preprocessing</span><span class="o">.</span><span class="n">Imputer</span><span class="p">()</span>
<span class="n">scaler</span> <span class="o">=</span> <span class="n">preprocessing</span><span class="o">.</span><span class="n">StandardScaler</span><span class="p">()</span>
<span class="n">mat_imputed</span> <span class="o">=</span> <span class="n">impute</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">mat</span><span class="p">)</span>
<span class="n">mat_imputed</span><span class="p">[:</span><span class="mi">4</span><span class="p">,</span> <span class="p">:</span><span class="mi">4</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[2]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 1.05419595,  1.42287309,  0.02368264, -0.8505244 ],
       [ 1.42287309,  5.09704588,  0.09560571, -2.46408728],
       [ 0.02368264,  0.03614203,  0.63317494,  0.09792298],
       [-0.8505244 , -2.46408728,  0.09792298,  2.04110849]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [5]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">mat_imp_and_scaled</span> <span class="o">=</span> <span class="n">scaler</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">mat_imputed</span><span class="p">)</span>
<span class="n">mat_imp_and_scaled</span><span class="p">[:</span><span class="mi">4</span><span class="p">,</span> <span class="p">:</span><span class="mi">4</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[5]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[  1.09907483e+00,   2.62635324e-01,  -3.88958755e-01,
         -4.80451718e-01],
       [  1.63825210e+00,   2.01707858e+00,  -7.50508486e-17,
         -1.80311396e+00],
       [ -4.08014393e-01,  -3.99538476e-01,   2.90716556e+00,
          2.97005140e-01],
       [ -1.68651124e+00,  -1.59341549e+00,   1.25317595e-02,
          1.88986410e+00]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>现在我们用<code>pipeline</code>来演示：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [6]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn</span> <span class="k">import</span> <span class="n">pipeline</span>
<span class="n">pipe</span> <span class="o">=</span> <span class="n">pipeline</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">([(</span><span class="s1">'impute'</span><span class="p">,</span> <span class="n">impute</span><span class="p">),</span> <span class="p">(</span><span class="s1">'scaler'</span><span class="p">,</span> <span class="n">scaler</span><span class="p">)])</span>
</pre></div>

</div>
</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>我们看看<code>pipe</code>的内容。和前面介绍一致，管线命令定义了处理步骤：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [7]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">pipe</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[7]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>Pipeline(steps=[('impute', Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)), ('scaler', StandardScaler(copy=True, with_mean=True, with_std=True))])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>然后在调用<code>pipe</code>的<code>fit_transform</code>方法，就可以把多个步骤组合成一个对象了：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [10]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">new_mat</span> <span class="o">=</span> <span class="n">pipe</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">mat</span><span class="p">)</span>
<span class="n">new_mat</span><span class="p">[:</span><span class="mi">4</span><span class="p">,</span> <span class="p">:</span><span class="mi">4</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[10]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[  1.09907483e+00,   2.62635324e-01,  -3.88958755e-01,
         -4.80451718e-01],
       [  1.63825210e+00,   2.01707858e+00,  -7.50508486e-17,
         -1.80311396e+00],
       [ -4.08014393e-01,  -3.99538476e-01,   2.90716556e+00,
          2.97005140e-01],
       [ -1.68651124e+00,  -1.59341549e+00,   1.25317595e-02,
          1.88986410e+00]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>可以用Numpy验证一下结果：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [11]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">array_equal</span><span class="p">(</span><span class="n">new_mat</span><span class="p">,</span> <span class="n">mat_imp_and_scaled</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[11]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>True</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>完全正确！本书后面的主题中，我们会进一步展示管线命令的威力。不仅可以用于预处理步骤中，在降维、算法拟合中也可以很方便的使用。</p>

</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="How-it-works...">How it works...<a class="anchor-link" href="using-pipelines-for-multiple-preprocessing-steps.html#How-it-works...">¶</a>
</h3>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>前面曾经提到过，每个scikit-learn的算法接口都类似。<code>pipeline</code>最重要的函数也不外乎下面三个：</p>
<ul>
<li><code>fit</code></li>
<li><code>transform</code></li>
<li><code>fit_transform</code></li>
</ul>
<p>具体来说，如果管线命令有<code>N</code>个对象，前<code>N-1</code>个对象必须实现<code>fit</code>和<code>transform</code>，第<code>N</code>个对象至少实现<code>fit</code>。否则就会出现错误。</p>
<p>如果这些条件满足，管线命令就会运行，但是不一定每个方法都可以。例如，<code>pipe</code>有个<code>inverse_transform</code>方法就是这样。因为由于计算步骤没有<code>inverse_transform</code>方法，一运行就有错误：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [12]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">pipe</span><span class="o">.</span><span class="n">inverse_transform</span><span class="p">(</span><span class="n">new_mat</span><span class="p">)</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt"></div>
<div class="output_subarea output_text output_error">
<pre>
<span class="ansi-red-intense-fg ansi-bold">---------------------------------------------------------------------------</span>
<span class="ansi-red-intense-fg ansi-bold">AttributeError</span>                            Traceback (most recent call last)
<span class="ansi-green-intense-fg ansi-bold">&lt;ipython-input-12-62edd2667cae&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-intense-fg ansi-bold">()</span>
<span class="ansi-green-intense-fg ansi-bold">----&gt; 1</span><span class="ansi-yellow-intense-fg ansi-bold"> </span>pipe<span class="ansi-yellow-intense-fg ansi-bold">.</span>inverse_transform<span class="ansi-yellow-intense-fg ansi-bold">(</span>new_mat<span class="ansi-yellow-intense-fg ansi-bold">)</span>

<span class="ansi-green-intense-fg ansi-bold">d:\programfiles\Miniconda3\lib\site-packages\sklearn\utils\metaestimators.py</span> in <span class="ansi-cyan-fg">&lt;lambda&gt;</span><span class="ansi-blue-intense-fg ansi-bold">(*args, **kwargs)</span>
<span class="ansi-green-fg">     35</span>             self<span class="ansi-yellow-intense-fg ansi-bold">.</span>get_attribute<span class="ansi-yellow-intense-fg ansi-bold">(</span>obj<span class="ansi-yellow-intense-fg ansi-bold">)</span>
<span class="ansi-green-fg">     36</span>         <span class="ansi-red-intense-fg ansi-bold"># lambda, but not partial, allows help() to work with update_wrapper</span>
<span class="ansi-green-intense-fg ansi-bold">---&gt; 37</span><span class="ansi-yellow-intense-fg ansi-bold">         </span>out <span class="ansi-yellow-intense-fg ansi-bold">=</span> <span class="ansi-green-intense-fg ansi-bold">lambda</span> <span class="ansi-yellow-intense-fg ansi-bold">*</span>args<span class="ansi-yellow-intense-fg ansi-bold">,</span> <span class="ansi-yellow-intense-fg ansi-bold">**</span>kwargs<span class="ansi-yellow-intense-fg ansi-bold">:</span> self<span class="ansi-yellow-intense-fg ansi-bold">.</span>fn<span class="ansi-yellow-intense-fg ansi-bold">(</span>obj<span class="ansi-yellow-intense-fg ansi-bold">,</span> <span class="ansi-yellow-intense-fg ansi-bold">*</span>args<span class="ansi-yellow-intense-fg ansi-bold">,</span> <span class="ansi-yellow-intense-fg ansi-bold">**</span>kwargs<span class="ansi-yellow-intense-fg ansi-bold">)</span>
<span class="ansi-green-fg">     38</span>         <span class="ansi-red-intense-fg ansi-bold"># update the docstring of the returned function</span>
<span class="ansi-green-fg">     39</span>         update_wrapper<span class="ansi-yellow-intense-fg ansi-bold">(</span>out<span class="ansi-yellow-intense-fg ansi-bold">,</span> self<span class="ansi-yellow-intense-fg ansi-bold">.</span>fn<span class="ansi-yellow-intense-fg ansi-bold">)</span>

<span class="ansi-green-intense-fg ansi-bold">d:\programfiles\Miniconda3\lib\site-packages\sklearn\pipeline.py</span> in <span class="ansi-cyan-fg">inverse_transform</span><span class="ansi-blue-intense-fg ansi-bold">(self, X)</span>
<span class="ansi-green-fg">    265</span>         Xt <span class="ansi-yellow-intense-fg ansi-bold">=</span> X
<span class="ansi-green-fg">    266</span>         <span class="ansi-green-intense-fg ansi-bold">for</span> name<span class="ansi-yellow-intense-fg ansi-bold">,</span> step <span class="ansi-green-intense-fg ansi-bold">in</span> self<span class="ansi-yellow-intense-fg ansi-bold">.</span>steps<span class="ansi-yellow-intense-fg ansi-bold">[</span><span class="ansi-yellow-intense-fg ansi-bold">:</span><span class="ansi-yellow-intense-fg ansi-bold">:</span><span class="ansi-yellow-intense-fg ansi-bold">-</span><span class="ansi-cyan-intense-fg ansi-bold">1</span><span class="ansi-yellow-intense-fg ansi-bold">]</span><span class="ansi-yellow-intense-fg ansi-bold">:</span>
<span class="ansi-green-intense-fg ansi-bold">--&gt; 267</span><span class="ansi-yellow-intense-fg ansi-bold">             </span>Xt <span class="ansi-yellow-intense-fg ansi-bold">=</span> step<span class="ansi-yellow-intense-fg ansi-bold">.</span>inverse_transform<span class="ansi-yellow-intense-fg ansi-bold">(</span>Xt<span class="ansi-yellow-intense-fg ansi-bold">)</span>
<span class="ansi-green-fg">    268</span>         <span class="ansi-green-intense-fg ansi-bold">return</span> Xt
<span class="ansi-green-fg">    269</span> 

<span class="ansi-red-intense-fg ansi-bold">AttributeError</span>: 'Imputer' object has no attribute 'inverse_transform'</pre>
</div>
</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>但是，<code>scalar</code>对象可以正常运行：</p>

</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [13]:</div>
<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">scaler</span><span class="o">.</span><span class="n">inverse_transform</span><span class="p">(</span><span class="n">new_mat</span><span class="p">)[:</span><span class="mi">4</span><span class="p">,</span> <span class="p">:</span><span class="mi">4</span><span class="p">]</span>
</pre></div>

</div>
</div>
</div>

<div class="output_wrapper">
<div class="output">


<div class="output_area">
<div class="prompt output_prompt">Out[13]:</div>


<div class="output_text output_subarea output_execute_result">
<pre>array([[ 1.05419595,  1.42287309,  0.02368264, -0.8505244 ],
       [ 1.42287309,  5.09704588,  0.09560571, -2.46408728],
       [ 0.02368264,  0.03614203,  0.63317494,  0.09792298],
       [-0.8505244 , -2.46408728,  0.09792298,  2.04110849]])</pre>
</div>

</div>

</div>
</div>

</div>
<div class="cell border-box-sizing text_cell rendered">
<div class="prompt input_prompt">
</div>
<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>只要把管线命令设置好，它就会如愿运行。它就是一组<code>for</code>循环，对每个步骤执行<code>fit</code>和<code>transform</code>，然后把结果传递到下一个变换操作中。</p>
<p>使用管线命令的理由主要有两点：</p>
<ul>
<li>首先是方便。代码会简洁一些，不需要重复调用<code>fit</code>和<code>transform</code>。</li>
<li>其次，也是更重要的作用，就是使用交叉验证。模型可以变得很复杂。如果管线命令中的一个步骤调整了参数，那么它们必然需要重新测试；测试一个步骤参数的代码管理成本是很低的。但是，如果测试5个步骤的全部参数会变都很复杂。管线命令可以缓解这些负担。</li>
</ul>
</div>
</div>
</div>
    </div>
  </div>

    </div>
    <aside class="postpromonav"><nav><ul itemprop="keywords" class="tags">
<li><a class="tag p-category" href="../categories/chs.html" rel="tag">CHS</a></li>
            <li><a class="tag p-category" href="../categories/ipython.html" rel="tag">ipython</a></li>
            <li><a class="tag p-category" href="../categories/machine-learning.html" rel="tag">Machine Learning</a></li>
            <li><a class="tag p-category" href="../categories/python.html" rel="tag">Python</a></li>
            <li><a class="tag p-category" href="../categories/scikit-learn-cookbook.html" rel="tag">scikit-learn cookbook</a></li>
        </ul>
<ul class="pager hidden-print">
<li class="previous">
                <a href="using-gaussian-processes-for-regression.html" rel="prev" title="using-gaussian-processes-for-regression">Previous post</a>
            </li>
            <li class="next">
                <a href="using-stochastic-gradient-descent-for-regression.html" rel="next" title="using-stochastic-gradient-descent-for-regression">Next post</a>
            </li>
        </ul></nav></aside><script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script><script type="text/x-mathjax-config">
MathJax.Hub.Config({
    tex2jax: {
        inlineMath: [ ['$','$'], ["\\(","\\)"] ],
        displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
        processEscapes: true
    },
    displayAlign: 'center', // Change this to 'center' to center equations.
    "HTML-CSS": {
        styles: {'.MathJax_Display': {"margin": 0}}
    }
});
</script></article>
</div>
        <!--End of body content-->

        <footer id="footer">
            Contents © 2017         <a href="mailto:muxuezi@gmail.com">Tao Junjie</a> - Powered by         <a href="https://getnikola.com" rel="nofollow">Nikola</a>         
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0">
<img alt="Creative Commons License BY-NC-SA" style="border-width:0; margin-bottom:12px;" src="http://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png"></a>
            
        </footer>
</div>
</div>


            <script src="../assets/js/all-nocdn.js"></script><script>$('a.image-reference:not(.islink) img:not(.islink)').parent().colorbox({rel:"gal",maxWidth:"100%",maxHeight:"100%",scalePhotos:true});</script><!-- fancy dates --><script>
    moment.locale("en");
    fancydates(0, "YYYY-MM-DD HH:mm");
    </script><!-- end fancy dates --><script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-51330059-1', 'auto');
  ga('send', 'pageview');

</script>
</body>
</html>
